Abstract. This paper is concerned with the approximate solution of stochastic optimal control problems which arise by perturbing the stochastic linear regulator problem through an additive term, with a small parameter $\delta$, in the drift coefficient of the unperturbed dynamical equations. The system states are assumed completely observable. Our main results concern expansions of solutions of the perturbed equation in powers $\delta, \delta^2, \delta^3, \ldots$ of the small parameter $\delta$.

1. Introduction. The problem of optimal control of Markov diffusion processes has been the subject of a great deal of research over the past several years; see for instance [FR1], [B1]. However, it is difficult to calculate optimal feedback control laws, except for the linear regulator problem and a few other special cases. In this paper we consider a nonlinear perturbation of the stochastic linear regulator, which takes the form of a small quantity times a certain function, and develop a technique for computing the optimal feedback control approximately. The system states are assumed completely observable. Our main results concern expansions of the solution of the perturbed problem in powers $\delta, \delta^2, \delta^3, \ldots$ and the validity of these expansions. Part of this problem has been considered by Kolmanovskii, but under stronger conditions than we consider here; see [Ko1].

Consider a stochastic system whose state $\xi^\delta(t)$ is an $n$-dimensional vector which satisfies the stochastic differential equation

$$d\xi^\delta(t) = \big(A(t)\xi^\delta(t) + \delta g(\xi^\delta(t)) + B(t)u(t)\big)\,dt + \sigma(t)\,dw(t) \qquad (1.1^\delta)$$

with initial data

$$\xi^\delta(s) = x. \qquad (1.2^\delta)$$

Here $w$ is a Brownian motion process of some dimension $d$. The system state $\xi^\delta(t)$ is assumed known to the controller. The control $u(t)$ at time $t$ is a vector of some dimension $k$, chosen using a feedback control law $Y$:

$$u(t) = Y(t, \xi^\delta(t)). \qquad (1.3)$$

The problem is to find among all $Y \in \mathcal{Y}(R^k)$, to be defined in §2, one for which the following quadratic criterion of expected system performance is minimum:


$$J^\delta(s,x;u) = E_{s,x}\int_s^T L(t,\xi^\delta(t),u(t))\,dt, \qquad (1.4^\delta)$$

where $T$ denotes the terminal time and $L(t,x,u) = x'M(t)x + u'N(t)u$. For convenience, we use the notation

$$f^{\delta,u}(t,x) = A(t)x + \delta g(x) + B(t)u.$$

When $\delta = 0$, this is the well-known linear regulator problem, for which the optimal feedback control is a linear function of the state (see [FR1, Section 6.5]):


$$Y^{0*}(s,x) = -N^{-1}(s)B'(s)K(s)x. \qquad (1.5)$$

Here $K(s)$ is a symmetric, non-negative definite matrix of size $n \times n$, bounded on any finite time interval.
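
The controlled dynamics (1.1) and the criterion (1.4) can be explored numerically. The following is a minimal sketch, not part of the original text, that estimates the cost of a given feedback law by the Euler-Maruyama scheme. It assumes a scalar system with $A = 0$, $B = 1$, $M = N = 1$, the gain $K(s) = \tanh(T-s)$ quoted for the scalar example at the end of the paper, and an illustrative perturbation $g(x) = -x^3$; the function name `simulate_cost` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_cost(x0, s, T, delta, sigma, feedback, g,
                  n_steps=2000, n_paths=1, seed=0):
    """Euler-Maruyama estimate of J(s,x;Y) for the scalar system
    d xi = (delta*g(xi) + u) dt + sigma dw, u = Y(t, xi),
    with running cost xi^2 + u^2 (assumes A = 0, B = 1, M = N = 1)."""
    rng = np.random.default_rng(seed)
    dt = (T - s) / n_steps
    xi = np.full(n_paths, float(x0))
    cost = np.zeros(n_paths)
    t = s
    for _ in range(n_steps):
        u = feedback(t, xi)
        cost += (xi**2 + u**2) * dt                      # left-point quadrature of L
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)  # Brownian increment
        xi = xi + (delta * g(xi) + u) * dt + sigma * dw
        t += dt
    return float(cost.mean())

T = 1.0
linear_feedback = lambda t, x: -np.tanh(T - t) * x   # (1.5) with K(s) = tanh(T - s)
cubic = lambda x: -x**3                               # illustrative perturbation g

# Noise-free runs: the unperturbed cost should be close to K(0) x^2 = tanh(1).
j0 = simulate_cost(1.0, 0.0, T, delta=0.0, sigma=0.0, feedback=linear_feedback, g=cubic)
j_delta = simulate_cost(1.0, 0.0, T, delta=0.1, sigma=0.0, feedback=linear_feedback, g=cubic)
```

With $\sigma > 0$ one would raise `n_paths` so that the sample mean approximates the expectation in (1.4).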

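Let $\phi^\delta(s,x)$ denote the minimum cost function, considered as a function of the initial data:

$$\phi^\delta(s,x) = \inf_{Y \in \mathcal{Y}(R^k)} J^\delta(s,x;Y). \qquad (1.6^\delta)$$

We wish to show, under certain conditions, that $\phi^\delta$ satisfies the partial differential equation, for all $(s,x) \in [0,T] \times R^n$,

$$\phi^\delta_s + \tfrac{1}{2}\,\mathrm{tr}\{\sigma\sigma'\phi^\delta_{xx}\} + H^\delta(s,x,\phi^\delta_x) = 0 \qquad (1.7^\delta)$$

together with the data

$$\phi^\delta(T,x) = 0, \qquad (1.8^\delta)$$

where tr is the trace of a square matrix, i.e.,

$$\mathrm{tr}\{\sigma\sigma'\phi^\delta_{xx}\} = \sum_{i,j=1}^n (\sigma\sigma')_{ij}\,\phi^\delta_{x_i x_j},$$

$\phi^\delta_x$ denotes the gradient of $\phi^\delta$ in the variables $x = (x_1,\ldots,x_n)'$, regarded as a row vector, and

$$H^\delta(s,x,P) = \min_{u \in K = R^k}\,[L(s,x,u) + P\cdot f^{\delta,u}(s,x)], \qquad (1.9^\delta)$$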


and the optimal feedback control $Y^{\delta*}$ satisfies

$$y'N(s)y + \phi^\delta_x(s,x)\cdot B(s)y = \min \ \text{on } R^k \qquad (1.10)$$

when $y = Y^{\delta*}(s,x)$. Thus, the completely observable optimal control problem is in principle reduced to solving the Cauchy problem $(1.7^\delta)$-$(1.8^\delta)$ for $\phi^\delta$ and then minimizing the left-hand side of (1.10) over $R^k$ for each $(s,x) \in [0,T] \times R^n$. This is usually difficult to do in practice. But for $\delta = 0$, it is well known that the solution is (see [FR1], Section 5)

$$\phi^0(s,x) = x'K(s)x + q(s), \qquad 0 \le s \le T,$$

where $q(s) = \int_s^T \mathrm{tr}\{\sigma\sigma'K\}\,dt$, and the corresponding solution of (1.10) is just $Y^{0*}(s,x)$ as in (1.5).
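
The matrix $K(s)$ is not given in closed form here. Substituting the quadratic form $x'K(s)x + q(s)$ into the dynamic programming equation with $\delta = 0$ leads to the standard Riccati equation $\dot K + A'K + KA - KBN^{-1}B'K + M = 0$, $K(T) = 0$, together with $\dot q = -\mathrm{tr}\{\sigma\sigma'K\}$, $q(T) = 0$. The sketch below, which is an illustration and not part of the original text, integrates both backward in time with a fourth-order Runge-Kutta step. Constant coefficient matrices are assumed for brevity, and the function name `solve_regulator` is hypothetical.

```python
import numpy as np

def solve_regulator(A, B, M, N, sigma, T, n_steps=4000):
    """Integrate, in the time-to-go variable tau = T - s,
        dK/dtau = A'K + KA - K B N^{-1} B' K + M,   K(0) = 0,
        dq/dtau = tr(sigma sigma' K),                q(0) = 0,
    with classical RK4. Constant coefficients are assumed."""
    n = A.shape[0]
    h = T / n_steps
    BNB = B @ np.linalg.solve(N, B.T)      # B N^{-1} B'
    SS = sigma @ sigma.T                    # sigma sigma'

    def rhs(K):
        dK = A.T @ K + K @ A - K @ BNB @ K + M
        dq = np.trace(SS @ K)
        return dK, dq

    K = np.zeros((n, n))
    q = 0.0
    Ks, qs = [K.copy()], [q]
    for _ in range(n_steps):
        k1, p1 = rhs(K)
        k2, p2 = rhs(K + 0.5 * h * k1)
        k3, p3 = rhs(K + 0.5 * h * k2)
        k4, p4 = rhs(K + h * k3)
        K = K + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        q = q + h * (p1 + 2 * p2 + 2 * p3 + p4) / 6
        Ks.append(K.copy())
        qs.append(q)
    return np.array(Ks), np.array(qs)       # entry j corresponds to tau = j*h

# Scalar check: A = 0, B = M = N = 1 should give K = tanh(tau), q = sigma^2 ln cosh(tau).
one = np.array([[1.0]])
Ks, qs = solve_regulator(np.array([[0.0]]), one, one, one, np.array([[0.5]]), T=1.0)
```

The feedback gain in (1.5) is then $-N^{-1}B'K$, evaluated at the grid point with time-to-go $T - s$.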

We wish to find $\phi^\delta$ (and hence also $Y^{\delta*}$) approximately in terms of quantities computable from $\phi^0$. We seek conditions under which expansions of the following type hold uniformly for $(s,x)$ in any compact set:

$$\phi^\delta = \phi^0 + \delta\theta_1 + \delta^2\theta_2 + \ldots + \delta^k\theta_k + o(\delta^k), \qquad (1.11)$$

$$\phi^\delta_x = \phi^0_x + \delta\theta_{1x} + \delta^2\theta_{2x} + \ldots + \delta^k\theta_{kx} + o(\delta^k). \qquad (1.12)$$

The coefficients in (1.11) must have the property that $k!\,\theta_k$ is the $k$-th derivative of $\phi^\delta$ with respect to $\delta$ when $\delta = 0$. Hence they satisfy the equations found by formally differentiating $(1.7^\delta)$ repeatedly with respect to $\delta$ and setting $\delta = 0$. These equations involve the partial derivatives of $H^\delta$ and $\phi^0$ of corresponding orders. Whether such expansions are available will depend on smoothness properties of $H^\delta$, which will be guaranteed by the assumptions in §2. Suppose we use the optimal unperturbed policy $Y^{0*}$ in the perturbed problem. We would like to know how close to the optimum the performance $J^\delta(s,x;Y^{0*})$ is in the perturbed problem. Our method will also answer this question.

The following notations are used.

$C^\ell(\Gamma)$: If $\Gamma$ is an open set, we write $g \in C^\ell(\Gamma)$ to mean that the function $g$ together with its partial derivatives of orders $j = 1,\ldots,\ell$ are continuous on $\Gamma$. If $\Gamma$ is not open, then $g \in C^\ell(\Gamma)$ means that $g$ agrees on $\Gamma$ with a function $h \in C^\ell(\Gamma')$, where $\Gamma'$ is open and $\Gamma \subset \Gamma'$.

$C^{1,2}(Q)$: The same meaning as above, except that $g$ is twice continuously differentiable with respect to $x$ and continuously differentiable with respect to $t$.

$C_p(Q)$: The class of all continuous functions $\psi$ which satisfy a polynomial growth condition on $Q$, i.e., for some positive constants $c, m$, $|\psi(t,x)| \le c(1+|x|^m)$ when $(t,x) \in Q$.

$C_p^{(\ell)}(Q)$: The class of functions which, together with their derivatives up to order $\ell$, satisfy a polynomial growth condition on $Q$.


We begin in Section 2 by discussing assumptions on $f$, $L$, $\mathcal{Y}(K)$ and then obtain some preliminary results about the existence, uniqueness and boundedness of the moments of $\xi^\delta$ and some properties of $H^\delta$. In Section 3, the existence and uniqueness of solutions of the dynamic programming equation are proved. Then we use a verification theorem to show that the solution is $\phi^\delta$. In Section 4, we give the approximation method and prove that it is valid, and finally we discuss the goodness of $Y^{0*}$ in the perturbed problem.

2. Assumptions and Preliminary Results. The following assumptions are made.

(AI) $A(t)$, $B(t)$, $M(t)$ and $N(t)$ are bounded $C^\infty$ matrix-valued functions of size $n \times n$, $n \times k$, $n \times n$, $k \times k$ respectively. $M(t)$ is positive semi-definite and $N(t)$ is positive definite.

(AII) $g \in C_p^{(\ell)}(R^n)$ and $g_x(x)$ is a matrix-valued function with diagonal elements bounded above and off-diagonal elements bounded.

(AIII) There exist positive constants $M_1, M_2, M_3, \alpha_1, \alpha_2$ independent of $\delta$, and a positive $C^\infty$ function $V(x)$ such that

(a) $\tfrac{1}{2}\,\mathrm{tr}\{\sigma(t)\sigma'(t)V_{xx}(x)\} + V_x(x)\cdot(A(t)x + \delta g(x)) \le M_1(1+V(x))$

(b) $(1+|x|)\,|V_x(x)| \le M_1(1+V(x))$

(c) $V(x) \to \infty$ as $|x| \to \infty$

(d) $\alpha_1 + M_2|x|^2 \le V(x) \le \alpha_2 + M_3|x|^2$

(AIV) There exist positive constants $c_1, c_2$ such that

(a) $|\sigma(t,x)| \le c_1(1+|x|)$

and for all $v \in R^n$

(b) $\sum_{i,j=1}^n (\sigma(t)\sigma'(t))_{ij}\,v_i v_j \ge c_2|v|^2$.

This means that noise enters directly into each component of the system. The corresponding dynamic programming equation is uniformly parabolic. This enables us to apply results about parabolic partial differential equations.
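
Uniform parabolicity is commonly expressed as a lower bound $v'\sigma\sigma'v \ge c_2|v|^2$ on the quadratic form of $\sigma\sigma'$. As a small illustration, not taken from the original text, the best such constant for a fixed matrix $\sigma$ is the smallest eigenvalue of $\sigma\sigma'$; the helper name `ellipticity_constant` is hypothetical.

```python
import numpy as np

def ellipticity_constant(sigma):
    """Largest c2 with v'(sigma sigma')v >= c2 |v|^2 for all v,
    i.e. the smallest eigenvalue of the symmetric matrix sigma sigma'."""
    eigenvalues = np.linalg.eigvalsh(sigma @ sigma.T)   # returned in ascending order
    return float(eigenvalues[0])

# Noise acting on both components: strictly positive constant.
c2_full = ellipticity_constant(np.diag([2.0, 3.0]))
# A single noise source driving two components: the constant degenerates to zero.
c2_degenerate = ellipticity_constant(np.array([[1.0, 0.0], [1.0, 0.0]]))
```

For a time-dependent $\sigma(t)$ one would take the minimum of this quantity over a grid of times in $[0,T]$.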


Under the assumptions (AIII) and (AIV), we have the existence and uniqueness of the solution $\eta(t)$ of the free-control system, i.e., the system $(1.1^\delta)$ with $u \equiv 0$. Moreover, for any positive integer $m$, there exists a positive constant $C_m$ depending on $t, M_1, M_2, \ell$ such that (see [F3], [K1], [K2])

$$E_{s,x}|\eta(t)|^m \le C_m(1+|x|^m). \qquad (2.1)$$


If $K$ is a compact subset of $R^k$, then $H^\delta$ is clearly well defined by $(1.9^\delta)$. Let $u = V(s,x,P)$ be the unique vector in $K$ at which the minimum in $(1.9^\delta)$ is attained, for every $(s,x,P)$ belonging to an open set $\Gamma$ of $R^{2n+1}$. Then if $\Gamma$ is a set such that $V \in C^\ell(\Gamma)$, we have $H^\delta \in C^{\ell+1}(\Gamma)$. Moreover, $H^\delta_s = L_s + Pf_s$, $H^\delta_x = L_x + Pf_x$, $H^\delta_P = f$; see [F1]. When $K = R^k$, we can easily see that $H^\delta$ is a $C^\infty$ function and

$$V(s,x,P) = -\tfrac{1}{2}N^{-1}B'P'. \qquad (2.2)$$
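
Formula (2.2) is the unconstrained minimizer of the $u$-dependent part $u'Nu + PBu$ of $L + P\cdot f^{\delta,u}$. The following sketch, offered only as a numerical illustration with randomly generated (hypothetical) matrices, checks the first-order condition $2Nu + B'P' = 0$ and the resulting minimum value $-\tfrac{1}{4}PBN^{-1}B'P'$.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 3, 4
R = rng.normal(size=(k, k))
N = R @ R.T + k * np.eye(k)      # an illustrative positive definite N
B = rng.normal(size=(n, k))
P = rng.normal(size=(1, n))      # row vector, playing the role of phi_x

def objective(u):
    """The u-dependent part of L + P.f: u'Nu + P B u."""
    return float(u @ N @ u + (P @ B @ u)[0])

# Formula (2.2): V = -(1/2) N^{-1} B' P'
u_star = -0.5 * np.linalg.solve(N, B.T @ P.T).ravel()

gradient = 2 * N @ u_star + (B.T @ P.T).ravel()                    # should vanish
min_value = -0.25 * (P @ B @ np.linalg.solve(N, B.T @ P.T)).item()
```

Because $N$ is positive definite the objective is strictly convex, so the stationary point is the unique minimizer.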


Let $K$ be a closed convex subset of $R^k$ containing 0 as an interior point. Let

$$Q = [0,T] \times R^n, \qquad Q_r = [0,T] \times \{|x| < r\}, \quad r = 1,2,\ldots.$$

We denote by $\mathcal{Y}(K)$ the set of all functions $Y$ such that

(a) $Y(s,x) \in K$ for all $(s,x) \in Q$;

(b) when $(s,x), (s,y) \in Q_r$ with $0 \le s \le T' < T$,

$$|Y(s,x) - Y(s,y)| \le \alpha_r|x-y|; \qquad (2.3)$$

(c) for all $(s,x) \in Q$, $|Y(s,x)| \le \beta(1+|x|)$.

The positive constants $\alpha_r$, $\beta$ may be different for different functions $Y$. $\alpha_r$ may also depend on $T'$.

Lemma 2.1. Conditions (2.3) ensure the existence and uniqueness of the process $\xi^\delta(t)$ in $(1.1^\delta)$, given the control $Y$ and the initial data $(1.2^\delta)$.


Proof. Using the same $V$ as in (AIII), we have

$$\tfrac{1}{2}\,\mathrm{tr}\{\sigma(t)\sigma'(t)V_{xx}(x)\} + V_x(x)\cdot(A(t)x + \delta g(x) + B(t)Y(t,x)) \le M_1(1+V(x)) + \beta(1+|x|)\,|V_x(x)| \le M_1(1+\beta)(1+V).$$

Hence the solution $\xi^\delta$ of $(1.1^\delta)$ with $(1.2^\delta)$ exists. Uniqueness comes from (2.3b). Q.E.D.

Furthermore, there exists a positive constant $C'_m$ such that

$$E_{s,x}\max_{s \le t \le T}|\xi^\delta(t)|^m \le C'_m(1+|x|^m). \qquad (2.4)$$

The next lemma is an estimate on $E_{s,x}|\xi^{\delta*}(t)|$.

Lemma 2.2. $E_{s,x}|\xi^{\delta*}(t)| \le C(1+|x|)$, where $C$ is a positive constant independent of $K$.


Proof. Since $0 \in K$, by $(1.6^\delta)$ we have

$$\phi^\delta(s,x) \le E_{s,x}\int_s^T \eta'(t)M(t)\eta(t)\,dt,$$

where $\eta(t)$ is the solution of the free-control system. Because $M(t)$ is bounded, there exists a constant $\beta_1 > 0$ such that

$$\phi^\delta(s,x) \le \beta_1 E_{s,x}\int_s^T |\eta(t)|^2\,dt.$$

By (2.1) we have

$$\phi^\delta(s,x) \le \beta_2(1+|x|^2)$$

for a positive constant $\beta_2$. From $(1.6^\delta)$ and the positive definiteness of $N(t)$ we have

$$E_{s,x}\int_s^T |u^{\delta*}(t)|^2\,dt \le \frac{\beta_2}{\gamma}(1+|x|^2), \qquad (2.5)$$

where the positive constant $\gamma$ satisfies

$$\mu'N(t)\mu \ge \gamma|\mu|^2$$

for all $\mu \in R^k$.

Now subtract the equation governing the free-control system from $(1.1^\delta)$ with $u = u^{\delta*}$. Using the mean value theorem, we obtain

$$d(\xi^{\delta*} - \eta)(t) = \Big(A(t) + \delta\int_0^1 g_x(\eta + \lambda(\xi^{\delta*} - \eta))\,d\lambda\Big)(\xi^{\delta*} - \eta)(t)\,dt + B(t)u^{\delta*}(t)\,dt$$

with $(\xi^{\delta*} - \eta)(s) = 0$, where $\xi^{\delta*}$ is the solution of $(1.1^\delta)$ with $u = u^{\delta*}$. Let

$$g_x(x) = G_1(x) + G_2(x),$$

where $G_1(x)$ is a diagonal matrix-valued function whose diagonal elements are those of $g_x(x)$, and $G_2(x)$ is the same as $g_x(x)$ except that its diagonal elements are all zero. Let

$$A_1(t) = A(t) + \delta\int_0^1 G_2(\eta + \lambda(\xi^{\delta*} - \eta))\,d\lambda.$$

By assumption, $A_1(t)$ is still a bounded matrix-valued function. Let $X_1(t,v)$ be the principal matrix solution of the following equation at initial time $v$:

$$dX_1(t,v) = \delta\int_0^1 G_1(\eta + \lambda(\xi^{\delta*} - \eta))\,d\lambda\; X_1(t,v)\,dt.$$

Since the elements of $G_1$ are bounded above, $X_1(t,v)$ is bounded for $s \le v \le t$. Using the variation of constants formula, we get

$$(\xi^{\delta*} - \eta)(t) = \int_s^t X_1(t,v)\,[A_1(v)(\xi^{\delta*} - \eta)(v) + B(v)u^{\delta*}(v)]\,dv.$$


Then

$$|\xi^{\delta*}(t) - \eta(t)|^2 \le 2\Big|\int_s^t X_1(t,v)A_1(v)(\xi^{\delta*}(v) - \eta(v))\,dv\Big|^2 + 2\Big|\int_s^t X_1(t,v)B(v)u^{\delta*}(v)\,dv\Big|^2.$$

Taking the expectation $E_{s,x}$ and using (2.5) together with the Cauchy-Schwarz and Gronwall inequalities, we have

$$E_{s,x}|\xi^{\delta*}(t) - \eta(t)|^2 \le \beta_3(1+|x|^2),$$

where $\beta_3$ is a positive constant. Hence

$$E_{s,x}|\xi^{\delta*}(t)|^2 \le 2E_{s,x}|\xi^{\delta*}(t) - \eta(t)|^2 + 2E_{s,x}|\eta(t)|^2 \le \beta_4(1+|x|^2),$$

where $\beta_4$ is a positive constant. Since $(E_{s,x}|\xi^{\delta*}(t)|^p)^{1/p}$ is nondecreasing as $p$ increases, we obtain

$$E_{s,x}|\xi^{\delta*}(t)| \le C(1+|x|),$$

where $C$ is a positive constant not depending on $K$. Q.E.D.


Let $X_2(t,v)$ be the principal matrix solution at initial time $v$ of

$$dX_2(t,v) = \delta\,G_1(\xi^{\delta*}(t))\,X_2(t,v)\,dt.$$

By the assumption on $G_1$, $X_2(t,v)$ is bounded for $s \le v \le t$. Let $W$ be the solution of

$$dW(t) = (A(t) + \delta g_x(\xi^{\delta*}(t)))\,W(t)\,dt$$

with $W(s)$ = identity matrix. Again, using a technique similar to the one above, we can show that $W(t)$ is bounded and that the bound does not depend on $x$.

The next lemma, a modification of Lemma V.5.2 of [FR1], is concerned with the probabilistic representation for the solution $\psi(s,x)$ of a linear partial differential equation

$$\psi_s + \tfrac{1}{2}\,\mathrm{tr}\{\sigma\sigma'\psi_{xx}\} + \psi_x\cdot f + g(s,x)\psi + h(s,x) = 0.$$

Lemma 2.3. Let $\psi$ be a solution of the above equation in $[0,T] \times R^n$ with $\psi(T,x) = \Psi(x)$. Suppose that $\psi, h, \Psi$ belong to $C^{1,2}(Q)$ and $g$ is bounded and continuous on $Q$. Then

$$\psi(s,x) = E_{s,x}\int_s^T D(u)\,h(u,\xi(u))\,du + E_{s,x}\,D(T)\,\Psi(\xi(T)),$$

where

$$D(u) = \exp\int_s^u g(v,\xi(v))\,dv.$$

Proof. Consider $\psi D$ in the proof of the cited lemma. Q.E.D.
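
The representation in Lemma 2.3 lends itself to Monte Carlo evaluation. The sketch below is an illustration under simplifying assumptions and not part of the original argument: a scalar diffusion, Euler-Maruyama time stepping, and left-point sums for the time integrals. The function name `feynman_kac` and the test problem are hypothetical. For zero drift, $\sigma = 1$, constant $g = c$, $h = 1$ and $\Psi(x) = x^2$, direct substitution shows that $\psi(s,x) = e^{c(T-s)}(x^2 + T - s) + (e^{c(T-s)} - 1)/c$ solves the equation, which gives a reference value.

```python
import numpy as np

def feynman_kac(s, x, T, drift, sigma, g, h, terminal,
                n_steps=200, n_paths=20000, seed=0):
    """Monte Carlo estimate of
        psi(s,x) = E int_s^T D(u) h(u, xi(u)) du + E D(T) Psi(xi(T)),
        D(u) = exp(int_s^u g(v, xi(v)) dv),
    for the scalar diffusion d xi = drift(t, xi) dt + sigma dw."""
    rng = np.random.default_rng(seed)
    dt = (T - s) / n_steps
    xi = np.full(n_paths, float(x))
    log_d = np.zeros(n_paths)      # log D(u) along each path
    running = np.zeros(n_paths)    # accumulates int D h du
    t = s
    for _ in range(n_steps):
        running += np.exp(log_d) * h(t, xi) * dt
        log_d += g(t, xi) * dt
        xi = xi + drift(t, xi) * dt + sigma * rng.normal(0.0, np.sqrt(dt), n_paths)
        t += dt
    return float(np.mean(running + np.exp(log_d) * terminal(xi)))

c = 0.3
estimate = feynman_kac(
    0.0, 1.0, 1.0,
    drift=lambda t, x: np.zeros_like(x),
    sigma=1.0,
    g=lambda t, x: c * np.ones_like(x),
    h=lambda t, x: np.ones_like(x),
    terminal=lambda x: x**2,
)
reference = np.exp(c) * (1.0 + 1.0) + (np.exp(c) - 1.0) / c
```

The estimate carries both a sampling error of order $n_{\text{paths}}^{-1/2}$ and a time-discretization error, so agreement with the reference is only approximate.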

3. Dynamic Programming Equation. Let $\mathcal{D}_Q$ denote the set of all non-negative real-valued functions $\psi$ on $Q$ such that

(i) the partial derivatives $\psi_s, \psi_{x_i}, \psi_{x_i x_j}$, $i,j = 1,\ldots,n$, are continuous on $Q$ and satisfy a Hölder condition on each compact subset of $Q$;

(ii) $\psi \in C_p(Q)$;

(iii) $\psi(T,x) = 0$ for all $x$.

We seek a solution in $\mathcal{D}_Q$ of

$$\psi_s + \tfrac{1}{2}\,\mathrm{tr}\{\sigma\sigma'\psi_{xx}\} + H^\delta(s,x,\psi_x) = 0 \qquad (3.1)$$

with the Cauchy data $\psi(T,x) = 0$. If $\phi^\delta$ is such a solution, let $Y^{\delta*}$ be defined by (1.10). Since the first term on the left-hand side of (1.10) is quadratic in $y$ and the second term is linear in $y$, (1.10) uniquely determines $Y^{\delta*}$. The function $Y^{\delta*}$ clearly satisfies (2.3a). By a proof similar to that of [F2, Theorem 2.2], it satisfies (2.3b); we shall prove later that (2.3c) holds.

The following theorem is quoted from [FR1]. It tells us that the existence of $\phi^\delta$ and $Y^{\delta*}$ implies a solution to the minimum problem. Let $\Gamma$ be an open subset of $Q$ and $\partial^*\Gamma$ be a closed subset of the boundary of $\Gamma$ such that $(\tau, \xi^\delta(\tau)) \in \partial^*\Gamma$ with probability 1, for every choice of initial data $(s,x) \in \Gamma$ and every admissible control, where $\tau$ is the first exit time. Let

$$J(s,x;Y) = E_{s,x}\int_s^\tau L(t,\xi^\delta(t),u(t))\,dt.$$

[Verification Theorem]: Let $\psi(s,x)$ be a solution of (3.1) with the boundary data $\psi(s,x) = 0$ for $(s,x) \in \partial^*\Gamma$, such that $\psi$ is in $C^{1,2}(\Gamma)$ and continuous on the closure $\bar\Gamma$. Then

(a) $\psi(s,x) \le J(s,x;Y)$ for any admissible feedback control $Y$ and any initial data $(s,x) \in \Gamma$.

(b) If $Y^*$ is an admissible feedback control such that (1.10) is satisfied when $y = Y^*(s,x)$, then $\psi(s,x) = J(s,x;Y^*)$.

This $Y^*$ is optimal among all admissible feedback control laws, for all choices of initial data $(s,x) \in \Gamma$.

Let us now show that there is a unique solution in $\mathcal{D}_Q$ of (3.1). This will be done by approximation in two stages. In the first step we assume $K$ is a compact set containing zero as an interior point. Let $\tau_r$ denote the first time $t < T$ when $|\xi^\delta(t)| = r$; if $|\xi^\delta(t)| < r$ for $s \le t \le T$, we set $\tau_r = T$, where $\xi^\delta(s) = x$ and $|x| < r$. Let

$$J^\delta_r(s,x;Y) = E_{s,x}\int_s^{\tau_r} L(t,\xi^\delta(t),u(t))\,dt. \qquad (3.2)$$

As $r \to \infty$, $\tau_r$ increases to $T$. Since $L$ is non-negative, the monotone convergence theorem implies that $J^\delta_r(s,x;Y)$ tends to $J^\delta(s,x;Y)$. For $r = 1,2,\ldots$, let

$$\phi^\delta_r(s,x) = \min_{Y \in \mathcal{Y}(K)} J^\delta_r(s,x;Y);$$

then $0 \le \phi^\delta_1 \le \phi^\delta_2 \le \ldots$. Since $0 \in K$, we have $\phi^\delta_r \le J^\delta_r(s,x;0)$. From (2.1) and $J^\delta_r(s,x;0) \le J^\delta(s,x;0)$ we have

$$\phi^\delta_r(s,x) \le \nu_1(1+|x|^2) \qquad (3.3)$$

for some positive constant $\nu_1$. Let

$$\phi^\delta(s,x) = \lim_{r \to \infty}\phi^\delta_r(s,x). \qquad (3.4)$$


Clearly, $\phi^\delta$ satisfies (3.3) too. By Theorem VI.6.1 of [FR1] and the verification theorem, for each $r$, $\phi^\delta_r$ is a solution of (3.1) with the boundary data $\phi^\delta_r = 0$ on

$$\Sigma_r = ([0,T] \times \{|x| = r\}) \cup (\{T\} \times \{|x| < r\}).$$

In order to show that $\phi^\delta$ also belongs to $\mathcal{D}_Q$, we need to establish a uniform bound on any compact set for the gradients $(\phi^\delta_r)_x$.

Lemma 3.1. Let $B$ be a compact subset of $Q_{r_0}$. Then $(\phi^\delta_r)_x$ is bounded on $B$ uniformly with respect to $r > r_0$.


Proof. With $\psi = \phi^\delta_r$ in Lemma 5.3, p. 494 of [F1], we have

$$(\phi^\delta_r)_x(s,x) = E_{s,x}\int_s^{\tau_r} L_x W\,dt + E_{s,x}\big[(\phi^\delta_r)_x(\tau_r,\xi^\delta(\tau_r))\,W(\tau_r)\big], \qquad (3.5)$$

where $\tau_r$ is the exit time from $Q_r$ with $Y = Y^*_r$, the optimal control law corresponding to $\phi^\delta_r$. Since $(\phi^\delta_r)_x(T,\xi^\delta(T)) = 0$, $|L_x| \le a_1 L + a_2$ for suitable $a_1, a_2$, and $W$ is bounded, we have

$$|(\phi^\delta_r)_x(s,x)| \le a_1\phi^\delta_r(s,x) + a_2(T-s) + \max_{\Sigma_r}|(\phi^\delta_r)_x|\;P\{\tau_r < T\}.$$

Let $N_r$ be a number such that

$$\phi^\delta_r(s,x) \le N_r(r - |x|)$$

whenever $|x| \le r$. Then

$$|(\phi^\delta_r)_x(s,x)| \le a_1\phi^\delta_r(s,x) + a_2(T-s) + N_r P\{\tau_r < T\}. \qquad (3.6)$$


In order to show that $N_r P\{\tau_r < T\}$ is uniformly bounded with respect to $r > r_0$, we have to estimate $N_r$. Given $x$, take $x^0$ with $|x^0| = r$, $|x - x^0| = r - |x|$. Let $\nu = -x^0/r$. We construct a barrier $\Theta$ at $(s,x^0)$ as follows:

$$\Theta(s,x) = e^{T-s}\big(1 - e^{-k_r\,\nu\cdot(x-x^0)}\big),$$

where $k_r$ is the positive root of

$$c_2k^2 - M_rk - 1 = 0,$$

$M_r$ is a bound of $|Ax + \delta g(x) + BY^*|$ whenever $|x| \le r$, and $c_2$ is defined in (AIV). By straightforward calculation, we have

$$\Theta_s + \tfrac{1}{2}\,\mathrm{tr}\{\sigma\sigma'\Theta_{xx}\} + \Theta_x\cdot(Ax + \delta g + BY^*) \le -1$$

and $\Theta \ge 0$ on $Q_r$. By the maximum principle,

$$\phi^\delta_r(s,x) \le \big(\max_{Q_r} L^{Y^*}\big)\,\Theta.$$

Moreover, since $\Theta(s,x^0) = 0$ and $r - |x| = |x - x^0|$,

$$\Theta(s,x) \le \big(\max_{Q_r}|\Theta_x|\big)(r - |x|),$$

$$\Theta(s,x) \le e^T k_r(r - |x|).$$

Since $K$ is compact, $k_r \le D_1(1+r)^\ell$ for some positive constant $D_1$ and $L^{Y^*} \le D_2(1+r)^2$; therefore

$$\phi^\delta_r(s,x) \le D_3(1+r)^{2+\ell}(r - |x|)$$

for some $D_3 > 0$. We take $N_r = D_3(1+r)^{2+\ell}$. Finally,

$$P\{\tau_r < T\} \le r^{-\lambda}E_{s,x}\max_{s \le t \le T}|\xi^\delta(t)|^\lambda.$$

By (2.4), we have

$$P\{\tau_r < T\} \le r^{-\lambda}C''(1+|x|^\lambda),$$

where $C''$ does not depend on $r$. If we take $\lambda > 2 + \ell$ and recall (3.3), this proves the lemma.

Standard estimates for second order parabolic equations (see [Fr1], pp. 60, 65, 191) and passages to the limit then imply the desired properties of $\phi^\delta$. The technical details of the argument are similar to the proof of Theorem VI.6.2 of [FR1].

We have shown that $\phi^\delta \in \mathcal{D}_Q$ and $Y^{\delta*} \in \mathcal{Y}(K)$. Hence by the verification theorem, $\phi^\delta(s,x)$ is the minimum value of $J^\delta(s,x;Y)$.

The following lemma is a probabilistic representation of $\phi^\delta_x$.

Lemma 3.2. $\phi^\delta_x(s,x) = E_{s,x}\int_s^T L_x(t,\xi^{\delta*}(t),u^*(t))\,W(t)\,dt$, where $W(t)$ is defined in Section 2.


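Proof. By (3.6), since $N_r P\{\tau_r < T\} \to 0$ as $r \to \infty$ and $\phi^\delta = \lim_{r \to \infty}\phi^\delta_r$, we have

$$|\phi^\delta_x| \le a_1\phi^\delta + a_2(T-s).$$

Thus, using the boundedness of $W$, we obtain

$$E_{s,x}\big|\phi^\delta_x(\tau_r,\xi^\delta(\tau_r))\,W(\tau_r)\big| \le \mathrm{const}\cdot(1+r)^2\,P\{\tau_r < T\}.$$

Hence

$$E_{s,x}\big|\phi^\delta_x(\tau_r,\xi^\delta(\tau_r))\,W(\tau_r)\big| \to 0 \quad \text{as } r \to \infty.$$

By Lemma 5.3 of [F1] applied to $\phi^\delta$, and since $\tau_r \uparrow T$ as $r \to \infty$, we have

$$\phi^\delta_x(s,x) = E_{s,x}\int_s^T L_x(t,\xi^{\delta*}(t),u^*(t))\,W(t)\,dt. \quad \text{Q.E.D.}$$

From Lemma 3.2 and Lemma 2.2, there exists a positive constant $\nu$ independent of $K$ such that

$$|\phi^\delta_x(s,x)| \le \nu(1+|x|). \qquad (3.7)$$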


Let us now consider the case $K = R^k$. For $m = 1,2,\ldots$, let

$$K_m = \{|y| \le m\}.$$

Consider the corresponding $H^{\delta,m}$ in (3.1) and the solution $\phi^{\delta,m}$ found by the previous method. Then $\phi^{\delta,1} \ge \phi^{\delta,2} \ge \ldots \ge 0$. Let

$$\phi^\delta = \lim_{m \to \infty}\phi^{\delta,m}.$$

By (3.3) and (3.7), $\phi^{\delta,m}$ and $\phi^{\delta,m}_x$ are uniformly bounded on each compact set. Since $H^{\delta,m}$ tends to $H^\delta$ as $m \to \infty$, the same reasoning indicated right after the proof of Lemma 3.1 shows that $\phi^\delta \in \mathcal{D}_Q$ and satisfies (3.1). It remains to show that the corresponding optimal control policy $Y^{\delta*}$ satisfies (2.3c) and hence belongs to $\mathcal{Y}(R^k)$. Let $Y^m = Y^{\delta,m*}$ be the optimal control function corresponding to $\phi^{\delta,m}$. Then $Y^m$ tends to $Y^{\delta*}$ as $m \to \infty$. We want to estimate $|Y^m|$. Given $(s,x)$, let $M(y) = y'Ny + \phi^{\delta,m}_x\cdot By$. Then $M$ is minimum on $K_m$ for $y = Y^m = Y^m(s,x)$. Since 0 is an interior point of $K_m$, we have

$$M_y(Y^m)\cdot Y^m = \frac{d}{dz}M(zY^m)\Big|_{z=1} \le 0.$$

Therefore,

$$2\,Y^{m\prime}N(t)Y^m + \phi^{\delta,m}_x\cdot BY^m \le 0.$$
Since $B(t)$ is bounded, we have for some positive constant $\nu_2$

$$|Y^m| \le \nu_2\,|\phi^{\delta,m}_x|.$$

By (3.7), then

$$|Y^m| \le \nu_3(1+|x|),$$

where $\nu_3$ does not depend on $K_m$. Therefore we have the following.

Theorem 3.1. The function $\phi^\delta(s,x)$ defined by $(1.6^\delta)$ belongs to $\mathcal{D}_Q$ and satisfies (3.1). The function $Y^{\delta*}(s,x)$ defined by (1.10) belongs to $\mathcal{Y}(R^k)$. Thus $Y^{\delta*}$ is optimal.

Actually $\phi^\delta$ is as smooth as we want ($C^\infty$), since $H^\delta$ is $C^\infty$ and so are the Cauchy data. See [Fr1].


4. Asymptotic Formulas for $\phi^\delta$, $\phi^\delta_x$.

We are now ready to consider the expansions of the solution of the perturbed problem in terms of the solution of the unperturbed problem. At the end of this section we also indicate how the methods measure the goodness of the policy $Y^{0*}$ in the perturbed problem. Since $A(t)$, $B(t)$, $M(t)$ and $N(t)$ are $C^\infty$ functions, $\phi^\delta$, $\phi^0$, $Y^{\delta*}$ and $Y^{0*}$ are $C^\infty$ functions too.

Lemma 4.1. $\phi^\delta(s,x) \to \phi^0(s,x)$ as $\delta \to 0$.


Proof. Let $Y^{\delta*}$, $Y^{0*}$ be the controls corresponding to $\phi^\delta$, $\phi^0$ respectively, and let $\xi^{\delta*}$, $\xi^{0*}$ be the corresponding Markov processes (given the same initial data $(s,x)$). Let $\zeta$, $\tilde\zeta$ be the solutions of

$$d\zeta(t) = \big(A(t)\zeta(t) + \delta g(\zeta(t)) + B(t)Y^{0*}(t,\xi^{0*}(t))\big)\,dt + \sigma(t)\,dw(t),$$

$$d\tilde\zeta(t) = \big(A(t)\tilde\zeta(t) + B(t)Y^{\delta*}(t,\xi^{\delta*}(t))\big)\,dt + \sigma(t)\,dw(t)$$

with initial data $\zeta(s) = \tilde\zeta(s) = x$. Suppose $X(t,v)$ is the principal matrix solution at initial time $v$ of $dX/dt = A(t)X$. Then we have

$$\xi^{\delta*}(t) - \tilde\zeta(t) = \delta\int_s^t X(t,v)\,g(\xi^{\delta*}(v))\,dv,$$

and it is easy to see that $\xi^{\delta*} - \tilde\zeta \to 0$ in probability as $\delta \to 0$. Similarly we have $\zeta \to \xi^{0*}$ in probability as $\delta \to 0$. By the definition of $\phi^\delta$, $\phi^0$ we have

$$\phi^\delta(s,x) = J^\delta(s,x;Y^{\delta*}) \le J^\delta(s,x;Y^{0*}),$$

$$\phi^0(s,x) = J^0(s,x;Y^{0*}) \le J^0(s,x;Y^{\delta*}).$$

These imply

$$J^\delta(s,x;Y^{\delta*}) - J^0(s,x;Y^{\delta*}) \le \phi^\delta - \phi^0 \le J^\delta(s,x;Y^{0*}) - J^0(s,x;Y^{0*}),$$

i.e.,

$$E_{s,x}\int_s^T\big[\xi^{\delta*\prime}(t)M(t)\xi^{\delta*}(t) - \tilde\zeta'(t)M(t)\tilde\zeta(t)\big]\,dt \le \phi^\delta - \phi^0 \le E_{s,x}\int_s^T\big[\zeta'(t)M(t)\zeta(t) - \xi^{0*\prime}(t)M(t)\xi^{0*}(t)\big]\,dt.$$

Since $E_{s,x}|\xi^{\delta*\prime}M\xi^{\delta*} - \tilde\zeta'M\tilde\zeta|^2$ and $E_{s,x}|\zeta'M\zeta - \xi^{0*\prime}M\xi^{0*}|^2$ are bounded and the bounds do not depend on $\delta$, we use the Lebesgue dominated convergence theorem to get the result. Q.E.D.

Lemma 4.2. $\phi^\delta_x(s,x) \to \phi^0_x(s,x)$ uniformly on any compact set.

Proof. Since in (3.7) $\nu$ is independent of $\delta$, (3.7) implies that $\phi^\delta_x$ is uniformly bounded on any compact set. By Theorem 3.1, $\phi^\delta \in \mathcal{D}_Q$. Moreover, we know $\phi^\delta \in C^\infty$. Hence $\phi^\delta_x$ is equicontinuous on any compact set. By Ascoli's theorem, there exists a subsequence $\phi^{\delta_n}_x$ which converges uniformly to a limit $\zeta$. Let us show that $\zeta = \phi^0_x$. Since

$$\int_{x_{oi}}^{x_i}\phi^{\delta_n}_{x_i}\,dx_i \to \int_{x_{oi}}^{x_i}\zeta_i\,dx_i$$

and, using Lemma 4.1,

$$\int_{x_{oi}}^{x_i}\phi^{\delta_n}_{x_i}\,dx_i = \phi^{\delta_n}(s,x_i) - \phi^{\delta_n}(s,x_{oi}) \to \phi^0(s,x_i) - \phi^0(s,x_{oi}),$$

then, using the fundamental theorem of calculus, we have $\zeta_i = \phi^0_{x_i}$ and hence the lemma. Q.E.D.

$\xi^{\delta*} \to \xi^{0*}$ in probability as $\delta \to 0$. Q.E.D.


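We now consider formula (1.11) with $k = 1$.

Lemma 4.4. $\phi^\delta = \phi^0 + \delta\theta_1 + o(\delta)$, where $\delta^{-1}o(\delta) \to 0$ as $\delta \to 0$ uniformly on any compact set, and $\theta_1$ satisfies

$$(\theta_1)_s + \tfrac{1}{2}\,\mathrm{tr}\{\sigma\sigma'(\theta_1)_{xx}\} + (\theta_1)_x\cdot f^{0,Y^{0*}} + \phi^0_x\cdot g = 0 \qquad (4.3^0)$$

with the data $\theta_1(T,x) = 0$.

Proof. We can show that $\theta_1(s,x)$ has the following form:

$$\theta_1(s,x) = E_{s,x}\int_s^T \phi^0_x(t,\xi^{0*}(t))\cdot g(\xi^{0*}(t))\,dt.$$

For $\delta > 0$, let

$$\theta^\delta_1 = \delta^{-1}(\phi^\delta - \phi^0).$$

By $(1.7^\delta)$ and $(1.7^0)$, $\theta^\delta_1$ satisfies

$$(\theta^\delta_1)_s + \tfrac{1}{2}\,\mathrm{tr}\{\sigma\sigma'(\theta^\delta_1)_{xx}\} + (\theta^\delta_1)_x\cdot f^{\delta,\,(Y^{\delta*}+Y^{0*})/2} + \phi^0_x\cdot g = 0. \qquad (4.3^\delta)$$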

Let $\zeta^\delta$ be the solution of

$$d\zeta^\delta(t) = f^{\delta,\,(Y^{\delta*}+Y^{0*})/2}(t,\zeta^\delta(t))\,dt + \sigma(t)\,dw$$

with initial data $\zeta^\delta(s) = x$. Since $\tfrac{1}{2}(Y^{\delta*} + Y^{0*}) \in \mathcal{Y}(R^k)$, the solution $\zeta^\delta$ exists, is unique in the usual sense, and all of its moments are bounded. As in the proof of Lemma 4.3, we can show that $\zeta^\delta \to \xi^{0*}$ in probability as $\delta \to 0$. Also, by Lemma 2.3,

$$\theta^\delta_1(s,x) = E_{s,x}\int_s^T \phi^0_x(t,\zeta^\delta(t))\cdot g(\zeta^\delta(t))\,dt,$$

where $\phi^0_x\cdot g$ satisfies a polynomial growth condition, i.e., $E_{s,x}|\phi^0_x(t,\zeta^\delta(t))\cdot g(\zeta^\delta(t))|^2$ is bounded independently of $\delta$. Hence

$$\lim_{\delta \to 0}E_{s,x}\int_s^T \phi^0_x(t,\zeta^\delta(t))\cdot g(\zeta^\delta(t))\,dt = E_{s,x}\int_s^T \phi^0_x(t,\xi^{0*}(t))\cdot g(\xi^{0*}(t))\,dt.$$

Thus $\theta^\delta_1 \to \theta_1$ as $\delta \to 0$. The convergence is uniform on any compact set. Hence the lemma is proved. Q.E.D.

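Now we consider formula (1.12) with $k = 1$.

Lemma 4.5. $\phi^\delta_x = \phi^0_x + \delta(\theta_1)_x + o(\delta)$, where $\delta^{-1}o(\delta) \to 0$ as $\delta \to 0$ uniformly on any compact set.

Proof. It is equivalent to show $(\theta^\delta_1)_x \to (\theta_1)_x$ as $\delta \to 0$. Using $(4.3^\delta)$, $(\theta^\delta_1)_{x_i}$ satisfies

$$(\theta^\delta_{1x_i})_s + \tfrac{1}{2}\,\mathrm{tr}\{\sigma\sigma'(\theta^\delta_{1x_i})_{xx}\} + (\theta^\delta_{1x_i})_x\cdot f^{\delta,Y^{\delta*}} + (\theta^\delta_1)_x\cdot f^{\delta,Y^{0*}}_{x_i} + (\phi^0_x\cdot g)_{x_i} = 0 \qquad (4.5^\delta)$$

with $\theta^\delta_{1x_i}(T,x) = 0$. Let $W^\delta$ be the principal matrix solution at initial time $s$ of

$$dW^\delta = f^{\delta,Y^{0*}}_x(t,\xi^{\delta*}(t))\,W^\delta\,dt.$$

In fact, $f^{\delta,Y^{0*}}_x = A + \delta g_x - BN^{-1}B'K$. Using a technique similar to the proof of the boundedness of $W$, we can show that $W^\delta$ is bounded and that the bound does not depend on $x$. As in Lemma 3.2, we can show

$$(\theta^\delta_1)_x(s,x) = E_{s,x}\int_s^T (\phi^0_x\cdot g)_x\,W^\delta\,\Big|_{(t,\xi^{\delta*}(t))}\,dt. \qquad (4.6^\delta)$$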


Since the moments of $\xi^{\delta*}(t)$ are bounded and $(\phi^0_x\cdot g)_x \in C_p$, then

$$\lim_{\delta \to 0}(\theta^\delta_1)_x(s,x) = E_{s,x}\int_s^T (\phi^0_x\cdot g)_x\,W^0\,\Big|_{(t,\xi^{0*}(t))}\,dt, \qquad (4.7)$$

where $W^0$ is the principal matrix solution at initial time $s$ of

$$dW^0 = (A - BN^{-1}B'K)\,W^0\,dt.$$

It is easy to see that the right-hand side of (4.7) is just $(\theta_1)_x(s,x)$. Indeed, from $(4.3^0)$ we have

$$(\theta_{1x_i})_s + \tfrac{1}{2}\,\mathrm{tr}\{\sigma\sigma'(\theta_{1x_i})_{xx}\} + (\theta_{1x_i})_x\cdot f^{0,Y^{0*}} + (\theta_1)_x\cdot f^{0,Y^{0*}}_{x_i} + (\phi^0_x\cdot g)_{x_i} = 0.$$

By a similar procedure,

$$(\theta_1)_x(s,x) = E_{s,x}\int_s^T (\phi^0_x\cdot g)_x\,W^0\,\Big|_{(t,\xi^{0*}(t))}\,dt.$$

Hence $(\theta^\delta_1)_x \to (\theta_1)_x$ as $\delta \to 0$ uniformly on any compact set. Q.E.D.


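Lemma 4.6. $\phi^\delta = \phi^0 + \delta\theta_1 + \delta^2\theta_2 + o(\delta^2)$, where $\delta^{-2}o(\delta^2) \to 0$ as $\delta \to 0$ uniformly on any compact set and $\theta_2$ is defined by

$$(\theta_2)_s + \tfrac{1}{2}\,\mathrm{tr}\{\sigma\sigma'(\theta_2)_{xx}\} + (\theta_2)_x\cdot f^{0,Y^{0*}} + \tfrac{1}{2}\,\theta_{1x}H_{PP}\,\theta_{1x}' + (\theta_1)_x\cdot g = 0 \qquad (4.8^0)$$

with data $\theta_2(T,x) = 0$.

Proof. Let

$$\theta^\delta_2 = \delta^{-1}(\theta^\delta_1 - \theta_1) = \delta^{-2}(\phi^\delta - \phi^0 - \delta\theta_1).$$

Then the problem is equivalent to $\theta^\delta_2 \to \theta_2$ as $\delta \to 0$. By $(4.3^\delta)$, $(4.3^0)$, $\theta^\delta_2$ satisfies

$$(\theta^\delta_2)_s + \tfrac{1}{2}\,\mathrm{tr}\{\sigma\sigma'(\theta^\delta_2)_{xx}\} + (\theta^\delta_2)_x\cdot f^{\delta,\,(Y^{\delta*}+Y^{0*})/2} + \tfrac{1}{2}\,(\theta^\delta_1)_x H_{PP}\,(\theta_1)_x' + (\theta_1)_x\cdot g = 0. \qquad (4.8^\delta)$$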

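Using Lemma 2.3,

$$\theta^\delta_2(s,x) = E_{s,x}\int_s^T\Big(\tfrac{1}{2}\,(\theta^\delta_1)_x(t,\zeta^\delta(t))\,H_{PP}\,(\theta_1)_x'(t,\zeta^\delta(t)) + \theta_{1x}(t,\zeta^\delta(t))\cdot g(\zeta^\delta(t))\Big)\,dt. \qquad (4.9^\delta)$$

Since $\zeta^\delta \to \xi^{0*}$ in probability as $\delta \to 0$, and by $(4.6^0)$, $(4.6^\delta)$ we can see that both $(\theta_1)_x$ and $(\theta^\delta_1)_x$ belong to $C_p$, the integrand of $(4.9^\delta)$ also belongs to $C_p$. Hence

$$\lim_{\delta \to 0}\theta^\delta_2(s,x) = E_{s,x}\int_s^T\Big[\tfrac{1}{2}\,\theta_{1x}(t,\xi^{0*}(t))\,H_{PP}\,\theta_{1x}'(t,\xi^{0*}(t)) + \theta_{1x}(t,\xi^{0*}(t))\cdot g(\xi^{0*}(t))\Big]\,dt,$$

and the right-hand side is just $\theta_2(s,x)$. In fact, from $(4.8^0)$,

$$\theta_2(s,x) = E_{s,x}\int_s^T\Big(\tfrac{1}{2}\,\theta_{1x}(t,\xi^{0*}(t))\,H_{PP}\,\theta_{1x}'(t,\xi^{0*}(t)) + \theta_{1x}(t,\xi^{0*}(t))\cdot g(\xi^{0*}(t))\Big)\,dt. \qquad (4.9^0)$$

Therefore $\theta^\delta_2 \to \theta_2$ as $\delta \to 0$ uniformly on any compact set. Q.E.D.

We can continue the procedure, and finally we have

Theorem 4.1. The expansions (1.11), (1.12) are valid for any $k \le \ell$ and hold uniformly on any compact set.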


Corollary 4.1. $Y^{\delta*}(s,x) = Y^{0*}(s,x) - \dfrac{\delta}{2}N^{-1}(s)B'(s)(\theta_1)_x'(s,x) - \ldots - \dfrac{\delta^k}{2}N^{-1}(s)B'(s)(\theta_k)_x'(s,x) + o(\delta^k)$, where $k \le \ell$.

Proof. Use (2.2) and Theorem 4.1. Q.E.D.
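
The truncated feedback law of Corollary 4.1 is straightforward to evaluate once the gradients $(\theta_j)_x$ are available at a point $(s,x)$. The sketch below is illustrative only: the function name `truncated_feedback` is hypothetical, and how the gradients are computed (for instance from the probabilistic representations in the lemmas above) is deliberately left outside its scope.

```python
import numpy as np

def truncated_feedback(delta, N, B, K, x, theta_grads):
    """Evaluate Y0*(s,x) - sum_j (delta^j / 2) N^{-1} B' (theta_j)_x' at a fixed (s, x).

    N, B, K are the matrices N(s), B(s), K(s); x is the state (1-D array);
    theta_grads[j-1] is the gradient of theta_j at (s, x) as a 1-D array."""
    u = -np.linalg.solve(N, B.T @ K @ x)                  # unperturbed law (1.5)
    for j, grad in enumerate(theta_grads, start=1):
        u = u - 0.5 * delta**j * np.linalg.solve(N, B.T @ grad)
    return u

# Scalar illustration with made-up numbers: N = 2, B = 1, K = 3, x = 1, (theta_1)_x = 4.
u0 = truncated_feedback(0.1, np.array([[2.0]]), np.array([[1.0]]),
                        np.array([[3.0]]), np.array([1.0]), [])
u1 = truncated_feedback(0.1, np.array([[2.0]]), np.array([[1.0]]),
                        np.array([[3.0]]), np.array([1.0]), [np.array([4.0])])
```

Passing an empty list of gradients returns the unperturbed linear law, so the same routine covers every truncation order.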


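Now we consider the goodness of $Y^{0*}(s,x)$ in the perturbed problem. By Corollary 4.1 we know that $Y^{0*}$ gives approximately the optimal control policy in the perturbed problem for small $\delta$. It is also plausible that $Y^{0*}$ should give approximately the optimum in the perturbed problem. The above lemmas and their method of proof put this rough statement on a quantitative basis. Let

$$\Phi^\delta(s,x) = J^\delta(s,x;Y^{0*}).$$

In particular, $\Phi^0(s,x) = \phi^0(s,x)$. For $\delta > 0$, $\Phi^\delta(s,x) - \phi^\delta(s,x)$ represents how much $Y^{0*}$ fails to be optimal in the perturbed problem. It is known that $\Phi^\delta \in \mathcal{D}_Q$ and satisfies the linear parabolic equation

$$(\Phi^\delta)_s + \tfrac{1}{2}\,\mathrm{tr}\{\sigma\sigma'\Phi^\delta_{xx}\} + \Phi^\delta_x\cdot f^{\delta,Y^{0*}} + x'Mx + Y^{0*\prime}NY^{0*} = 0 \qquad (4.10)$$

with data $\Phi^\delta(T,x) = 0$. Let us write

$$\Phi^\delta = \phi^0 + \delta\chi_1 + \delta^2\chi_2 + o(\delta^2). \qquad (4.11)$$

By the same procedure as before, we have for $k = 1,2$

$$(\chi_k)_s + \tfrac{1}{2}\,\mathrm{tr}\{\sigma\sigma'(\chi_k)_{xx}\} + (\chi_k)_x\cdot f^{0,Y^{0*}} + (\chi_{k-1})_x\cdot g = 0, \qquad (4.12)$$

where $\chi_0 = \phi^0$. Let $\chi^\delta_1 = \delta^{-1}(\Phi^\delta - \phi^0)$, $\chi^\delta_2 = \delta^{-2}(\Phi^\delta - \phi^0 - \delta\chi_1)$. Hence for $k = 1,2$

$$(\chi^\delta_k)_s + \tfrac{1}{2}\,\mathrm{tr}\{\sigma\sigma'(\chi^\delta_k)_{xx}\} + (\chi^\delta_k)_x\cdot f^{\delta,Y^{0*}} + (\chi_{k-1})_x\cdot g = 0.$$

Then

$$\chi_1 = \theta_1,$$

$$\chi_2 = E_{s,x}\int_s^T \theta_{1x}(t,\xi^{0*}(t))\cdot g(\xi^{0*}(t))\,dt.$$

By the same procedure, we can prove $\chi^\delta_1 \to \chi_1$, $\chi^\delta_2 \to \chi_2$ as $\delta \to 0$ uniformly on any compact set. By comparing with Lemma 4.6, we find that

$$\Phi^\delta(s,x) - \phi^\delta(s,x) = -\tfrac{1}{2}\,\delta^2 E_{s,x}\int_s^T \theta_{1x}(t,\xi^{0*}(t))\,H_{PP}\,\theta_{1x}'(t,\xi^{0*}(t))\,dt + o(\delta^2). \qquad (4.13)$$

Formula (4.13) shows that $Y^{0*}$ gives the optimum to within the order of the square of the intensity $\delta$ of the perturbation.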


Example. Let $\xi^\delta$ be the solution of the scalar Itô equation

$$d\xi^\delta = (-\delta(\xi^\delta)^3 + u(t))\,dt + \sigma\,dw$$

with $\xi^\delta(s) = x$. The control set is $R$. The criterion of performance is

$$J(s,x;u) = E_{s,x}\int_s^T(\xi^2 + u^2)\,dt.$$

By [FR1, Section 6.5], we have

$$\phi^0(s,x) = K(s)x^2 + q(s),$$

where $K(s) = \tanh(T-s)$ and $q(s) = \sigma^2\ln\cosh(T-s)$. By Theorem 4.1,

$$\theta_1(s,x) = -2\int_s^T K(t)\,E_{s,x}(\xi^{0*}(t))^4\,dt = -2\big(\beta_1(s)x^4 + \beta_2(s)x^2 + \beta_3(s)\big),$$

$$\beta_1(s) = \frac{1}{4}\Big(1 - \frac{1}{\cosh^4(T-s)}\Big),$$

$$\beta_2(s) = \frac{6\sigma^2}{4\cosh^2(T-s)}\Big(\tanh(T-s)(\cosh^4(T-s) - 1) - \frac{1}{8}\sinh 4(T-s) + \frac{T-s}{2}\Big),$$

$$\beta_3(s) = 3\sigma^4\Big(\frac{1}{4}\tanh^2(T-s)(\cosh^4(T-s) - 1) - \tanh(T-s)\Big(\frac{1}{16}\sinh 4(T-s) - \frac{T-s}{4}\Big) + \frac{1}{4}\sinh^4(T-s)\Big).$$
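
The coefficients of $\theta_1$ in the example can be checked numerically. Under the unperturbed optimal control $u = -\tanh(T-t)\,\xi$, the process $\xi^{0*}$ is Gaussian with mean $x\cosh(T-t)/\cosh(T-s)$ and variance $\sigma^2\cosh^2(T-t)\,(\tanh(T-s) - \tanh(T-t))$, and for a Gaussian variable with mean $m$ and variance $v$ one has $E\xi^4 = m^4 + 6m^2v + 3v^2$. The sketch below, an illustration with hypothetical function names, evaluates the three coefficients by trapezoidal quadrature and forms the first-order corrected control of Corollary 4.1 with $N = B = 1$.

```python
import numpy as np

def theta1_coefficients(s, T, sigma, n=20001):
    """Coefficients (b1, b2, b3) with theta_1(s,x) = -2*(b1*x^4 + b2*x^2 + b3),
    obtained from E xi^4 = m^4 + 6 m^2 v + 3 v^2 for the Gaussian process
    d xi = -tanh(T-t) xi dt + sigma dw, xi(s) = x."""
    t = np.linspace(s, T, n)
    K = np.tanh(T - t)
    ratio = np.cosh(T - t) / np.cosh(T - s)                         # mean = x * ratio
    var = sigma**2 * np.cosh(T - t)**2 * (np.tanh(T - s) - K)        # variance of xi(t)
    integrate = lambda f: float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))
    b1 = integrate(K * ratio**4)
    b2 = integrate(K * 6.0 * ratio**2 * var)
    b3 = integrate(K * 3.0 * var**2)
    return b1, b2, b3

def first_order_control(s, x, T, sigma, delta):
    """Y0*(s,x) - (delta/2) (theta_1)_x(s,x) for the scalar example (N = B = 1)."""
    b1, b2, _ = theta1_coefficients(s, T, sigma)
    theta1_x = -2.0 * (4.0 * b1 * x**3 + 2.0 * b2 * x)
    return -np.tanh(T - s) * x - 0.5 * delta * theta1_x

b1, b2, b3 = theta1_coefficients(0.0, 1.0, sigma=1.0)
u = first_order_control(0.0, 1.0, 1.0, sigma=1.0, delta=0.1)
```

Because the correction term is $+\delta(4\beta_1x^3 + 2\beta_2x)$, the corrected law applies less control effort than the linear law for $x > 0$, consistent with the stabilizing drift $-\delta\xi^3$.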
ACCOMPANYING STATEMENT

PERTURBED STOCHASTIC LINEAR REGULATOR PROBLEMS

by

Chun-Ping Tsai

The theory of optimal feedback control of Markov diffusion processes has been well developed. However, it is a difficult matter to calculate optimal feedback controls, except for the linear regulator problem and some other special cases.

In this paper a nonlinear perturbation of the stochastic linear regulator is considered. An algorithm is given for computing approximately the optimal feedback control, if the nonlinearity appearing in the nonlinear stochastic differential equations governing the system is a polynomial in the state variables. Under appropriate assumptions on the nonlinearity, the method is justified in a mathematically rigorous way. The quantities which need to be computed to find the optimum approximately can be expressed in terms of higher order moments of a known Gaussian process, namely the state process for the optimum linear regulator.


